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Abstract 

In this work, we consider a class of differentiable criteria for sparse image computing 
problems, where a non-convex regularization is applied to an arbitrary linear transform of 
the target image. As special cases, it includes edge preserving measures or frame analysis 
potentials commonly used in image processing. As shown by our asymptotic results, the 
^2 — (o penalties we consider may be employed to provide approximate solutions to £q- 
pcnalized optimization problems. One of the advantages of the proposed approach is that it 
allows us to derive an efficient Majorize-Minimize subspace algorithm. The convergence of 
the algorithm is investigated by using recent results in non-convex optimization. The fast 
convergence properties of the proposed optimization method are illustrated through image 
processing examples. In particular, its effectiveness is demonstrated on several data recovery 
problems. 



A preliminary version of this work has been presented in [18] . 
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The objective of this paper is to show that, for a wide range of variational problems in image 
processing, an estimation x £ M. N of the target image can be efficiently obtained by using a 
class of non-convex, regularizing criteria that promote sparsity. More specifically, we focus on 
the following penalized optimization problem: 

minimize (Fg(x) = 3>(Hx — y) + ^(a;)) , (1) 
x&R N 

where H / is a matrix in R QxN , y is a vector in R Q , $: R Q -> R and f j: -> R 
are functions, and 5 is a positive scalar. We are mainly interested in the case when is a 
differentiable function. This includes the classical squared Euclidean norm. The problem then 
reduces to a penalized least squares (PLS) problem \5A\ 155] . Another case of interest is when 
<3? is the separable Huber function [31] Example 5.4] which is useful for limiting the influence of 
outliers in some observed data. Other examples shall be mentioned subsequently. 

Note that the considered optimization problem is frequently encountered in the field of 
inverse problems. Then, y is some vector of observations related to the original image x £ R N 
through a linear model of the form 

y = Hx + w. (2) 

where H models the measurement process (e.g. a convolution operator or a projection operator), 
w is an additive noise vector, $> is a data-fidelity term and ^$>s is a regularization term. 

An efficient strategy to promote images formed by smooth regions separated by sharp edges, 
is to use regularization functions of the form 

S 

(Vx£R w ) *i(aO = Y / ^sAW v sX-c s \\) + \\V x\\ 2 , (3) 

8=1 

where || • || denotes the Euclidean norm, and, for every s £ {1, . . . , S}, c s £ R Ps , £ R PsXN 
and tp s s ■ R — > R- An important example of such a framework is when, for every s € {1, . . . , S}, 
P s = 1 and c s = 0, and V = { V, T , s £ {1, . . . , 5}} C R N constitutes a frame of R N , leading to 
a so-called frame- analysis regularization [23|. For every s £ {1, . . . , S 1 }, V s may also be a matrix 
serving to compute discrete gradients (or higher-order differences), useful for edge preservation. 
In particular, if S = N and, for every s € {1, . . . , N}, P s = 2, c s = and V, = [Aj AJ] t where 
£ R N (resp. £ R N ) corresponds to a horizontal (resp. vertical) gradient operator, and 
(Vi £ M) ijj s g(t) = X\t\ with A > 0, the first term in the right hand side of (|3|) corresponds to a 
discrete version of the isotropic total variation semi- norm |53j . Note that other choices of 
lead to different penalization strategies. For instance, one can use nonlocal mean regularization, 
which has been recently studied in the context of edge preserving functions in [48] . 

In order to preserve significant coefficients in V, one may require the functions (ip s ,s)i<s<s to 
have a slower-than-parabolic growth, as this limits the cost associated with these components. 
Two of the main families of such functions known in the literature are: 

(i) £2 — ^\ functions, i.e. convex, continuously differentiable, asymptotically linear functions 
with a quadratic behavior near [U [161 1371 161] . Typical examples are the functions 
(Vs £ {1, ... , S}) (yt £ R) ip SiS (t) = AVi 2 + 6 2 with A > 0. In the limit case when 5^0, 
the classical l\ penalty is obtained. 

(ii) £2 — (-0 functions, i.e. asymptotically constant functions with a quadratic behavior near 
[271 USB S3 [53 [60]. Typical examples are the truncated quadratic functions (Vs £ 
{1,...,5}) (Vt £ R) tp s ,s(t) = Amin(t 2 /(25 2 ),1) with A > 0. When 6 -> 0, an £ penalty 
is obtained. 



The last quadratic penalty term x \— > ||Voa;|| 2 in ([3]) plays a role similar to the elastic nei! 
regularization introduced in |62j . It allows us to guarantee some properties of the minimizers 
and minimization algorithms, when H is not injective (e.g. an ideal low-pass filtering operator). 

The £2 — £0 approach has been shown in the literature to be advantageous in many appli- 
cations, for instance sparse component analysis |44j . compressive sensing [32J, matrix comple- 
tion [¥T], robust regression [12], segmentation |51| . and image recovery [2Q|, 148] . This paper 
mainly addresses the latter problem, where £2 — £0 is recognized for its ability to preserve edges 
between homogeneous regions [35]. The non-convexity and sometimes non-differentiability of 
the potential function lead however to a difficult optimization problem. In this paper, we con- 
sider a class of non-convex differentiable potential functions, which can be viewed as smoothed 
versions of a truncated quadratic penalty function. 

An effective approach for the minimization of differentiable criteria is to consider a sub- 
space descent algorithm [23, 61j. For such methods, at each iteration, a stepsize vector allowing 
an optimized combination of several search directions is computed through a multidimensional 
search. Recently, an original stepsize strategy based on a Majorize-Minimize (MM) recursion 
was introduced in [17] , This latter approach leads to a closed- form algorithm whose practi- 
cal efficiency has been demonstrated in the context of image restoration, when using convex 
penalized least squares criteria. 

Our main contributions in this paper are: 

• to establish conditions under which a solution to an £q penalized criterion can be asymp- 
totically obtained by using the considered class of penalty functions; 

• to extend the approach in [T7] to non necessarily convex minimization problems of the 
form (TjQ); 

• to provide a proof of convergence of the iterates of the subspace MM algorithm; 

• to show the good practical performance of the proposed method on several applications. 

It must be stressed that the convergence proofs in this paper rely on recent results emphasizing 
the prominent role played by the Kurdyka-Lojasiewicz inequality [3j HI [5j [10] in the study of 
the convergence of various iterative optimization methods. Our results constitute a significant 
improvement over those in [T7]. In this previous article, the analysis was restricted to showing 
that the gradient of the objective function converges to zero. 

The rest of the paper is organized as follows: properties of the considered optimization 
problem are first investigated in section [2J Then, we introduce in section a minimization 
strategy based on an MM subspace scheme. In section U we investigate the general convergence 
properties for the proposed algorithm. Finally, section [5] illustrates the performance of our 
algorithm through a set of comparisons and experiments in image processing. 

2 Considered class of objective functions 

In this section, we briefly mention some useful properties of Problem (UJ. 
2.1 Existence of a minimizer 

First, we provide a preliminary result concerning the existence of a solution to the problem 
under the following assumption on the functions in ([1]) and on the nullspaces Ker H and Ker Vo 
of H and Vq, respectively: 



Assumption 1. (i) is continuous and coercive (that is lim|| z ||_ >+00 $(z) = +00). 

(ii) For every 5 > and s E {1, . . . , S}, i/j Sj s is continuous and takes nonnegative values. 

(iii) KerifnKerVo = {0}. 

Proposition 1. Suppose that Assumption^ holds. Then, for every 5 > 0, 

(i) Fs is coercive; 

(ii) the set of minimizers of Fs is nonempty and compact. 
Proof. Let 5 > 0. Since, for every s E {1, • • • , S}, tp Sj $ > 0, we have 

(Va; E R^) F{(aj) > <&{Hx - y) + \\V x\\ 2 = F(x). 
This implies that, for every r\ E M, 

lev< r/ Fj = {x G | F 5 (a;) < 77} C lev<„ £. 

w and 77 E 



(4) 



As $ is continuous and coercive, inf $ > —00. For every x E 
then 

$(Hx — y)<r) 



(5) 

if a? E lev^F, 

(6) 
(7) 



Then, as a consequence of ([ 
a; E lev^F, 

The combination of ([7]) and 
||Aa:|| < £' where 



|| Vbcc || < t) - inf 
and the coercivity of $, there exists C > such that, for every 

|| if as || < C- (8) 
D shows that there exists £' > such that, for every x E lev<^F, 



if 

V 



It can be deduced that, for every x E lev^Fn (Ker A)~ 

v\\x\\ < (' 



(9) 



(10) 



where y_ is the minimum non-zero singular value of A (the existence of which is guaranteed 
since A 7^ 0). In addition, Ker A = Ker if n Ker Vo = {0}, which implies that (Ker A) -1- = 
M. N . Hence, F is a level-bounded function, that is, for every 77 E M, lev< T?; F is bounded 
(and possibly empty). Using ([5]), we can conclude that Fs is a level-bounded function (or 
equivalently, it is coercive [52j Proposition 11.11]). As F§ is also continuous, (ii) follows from 
[521 Theorem 1.9]. □ 



Remark 1. (i) In the particular case when if is injective, Assumption { ffl iii ) is satisfied 
if Vq = 0. The injectivity of if obviously holds when if = I in ([2]) ; which typically 
corresponds to denoising applications. 

(ii) When Vq = 0, the existence of a minimizer of Fs with 5 > can also be guaranteed 
under other useful conditions. For example, this property holds under Assumptions {Wi)\ 
andj Wti) , if Ker if n Ds=i -^ er ^ = i®}> an< ^ w ^ en f or every s E {1, . . . , S}, s (0) is a 
nonempty bounded set. 



2.2 Non-convex regularization functions ^ 

In the remainder of this work, we will be interested in potentials satisfying the following addi- 
tional property: 

Assumption 2. (i) (Vs G {1,...,S}) (V($i,<$ 2 ) G (0,+oo) 2 ) ft < 5 2 => (Vt G R) ^ S)5l (i) > 

(ii) There exists A > such that 

(Vs € {1, ... , S})(\/t G R) lim^ J i(t) = AxR\{o}(t) (H) 

<5>0 



w/iere Xr\{0}(*) 



ift = 

1 otherwise. 



The latter condition shows that a binary penalty function is asymptotically obtained. Ex- 
amples of functions ips,& with s G {1, . . . , 5 1 } and 5 > satisfying Assumptions D P(ii)| and [2] are 
provided below: 



Example 2. (i) Truncated quadratic potential 

(Vt G M) ^(i) = A min Q^, 1^ , A > 0. 

(ii) Geman-McClure potential f28\/ : 



Xt 2 

(vt g R) = 262 + t2 , A > 0. 



(iii) Welsch potential flH]/ : 



t 2 



(Vt G R) ^ )<5 (t)=A(l-exp(-^)) 5 A>0. 



(iv) Hyberbolic tangent potential: 



(Vt G R) ^ s , s (t) = A tanh ) . A > °- 
(v) Tukey biweight potential f^: 

<W 6 R) ^(t) = ( A ( 1 -( 1 -^ 3 ) » , A>0. 

[ A otherwise 

The latter four functions are such that tps,s(t) ~ At 2 /(2<5 2 ) as t ->■ 0. They can thus be 
viewed as smoothed versions of the one- variable truncated quadratic function in Example E ffii)| 
(see Fig. p. 
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Figure 1: Truncated quadratic penalty in Exampl e [^i)| (black, full) and its smooth approximations 
ip s .s(t) as defined in Examples Q^ii)| (red, dashed), l ^iii) (blue, dash-dot), | ^iv) (green, dot), and ffivj| 
(magenta, cross), for parameters A = 1 and 5 = 1. 



2.3 Asymptotic convergence to £q criterion 

The asymptotic behavior of the considered class of potentials can now be derived by showing 
the epi-convergence of F$ to the following block (or group) ^o-P enai i ze d objective function: 



F Q : x h> $(Hx -y) + X£ (Vx - c) + \\V x\ 



(12) 



where V = [V-^ 
1251 defined as 



Vs T ] 



(Vt=[t],...,t J s ] J e 



where, for every s G {1, . . . , S}, t s G BL Ps 
expression of the £q cost of R s . 



> c s\ 



When P 



and £q denotes the so-called 'block £q cost' 



£o(t) 



s 

£ 

s=l 



Ps = l, 



\ts\\), (13) 
13]) provides the standard 



Proposition 2. Suppose that Assumptions^ and{j^hold. Let (5 n )neN be a decreasing sequence 
of positive real numbers converging to 0. Then, 

(i) inf Fg n — > inf Fq as n — >■ +oo. 

(ii) // (Vn G N) £C n is a minimizer of Fg n , then the sequence (x n ) nS N is bounded and all its 
cluster points are minimizers of Fq . 

(iii) If Fq has a unique minimizer x, then x n — > x as n — >■ +oo. 

Proof. First, note that, according to Assumption [^p)| for every n G N, Fg n+1 > F$ n . In addition, 
for every n G N, is a continuous function as a consequence of Assumptions Q [p)| and D P(ii)| 
Then it can be deduced from [52], Theorem 7.4(d)] that (Fg n ) n& ^ epi-converges to sup ngN F5 n . 
The latter function is equal to Fq by virtue of Assumption l ^ii)] In addition, (Fg n ) n& ^ is 
eventually level-boundecul as a consequence of [52j Ex. 7.32(a)], the lower bound in ((H) and the 



(fi„)neNis eventually level-bounded if, for every r\ £ R, there exists some subset M of N such that N \ M is 
finite and U„eM lev<^ F$ n is bounded. 



fact that F_: x t— > Q(Hx — y) + || Voas|| 2 is level-bounded (as shown in the proof of Proposition [TjK 
We complete the proof by noticing that Fq is lower semicontinuous and proper, and by applying 
[23 Theorem 7.33]. □ 

The above proposition guarantees that a minimizer of Fq can be well-approximated by 
choosing a small enough 5. Note that the existence/uniqueness of a minimizer of Fq is discussed 
in the literature on compressed sensing under some specific assumptions [HI HH [22] . 

We will now turn our attention to numerical methods allowing us to efficiently solve Problem 
([!]) when all the involved functions are smooth. 



3 Proposed optimization method 

3.1 Subspace algorithm 

A classical strategy to minimize the criterion F$ consists of building a sequence (x k ) k( z^ of M. N 
such that 

(VA G N) F s (x k+1 ) < F s (x k ). (14) 

This can be performed by translating the current solution x k at each iteration k € N along a 
suitable direction d k £ M. N : 

Xk+i = x k + a k d k , (15) 

where a k > is the stepsize, and d k is a descent direction. When F$ is differentiable, this 
direction is chosen such that gld k < where g k denotes the gradient of F$ at x k . 

A significant practical improvement regarding the convergence rate is achieved by performing 
subspace acceleration, i.e. by considering a set of M search directions {d\, . . . , d^ 1 } C M. N and 
by defining the new iteration as 

x k+1 = x k + D k u k , (16) 

where D k = [d\, . . . , d¥] G K ffxM is the search direction matrix and u k S M. M is a multivariate 
stepsize, which is computed so as to minimize 

fk,s- F s (x k + D k u). (17) 

The memory gradient subspace algorithm, initially proposed in the late 1960's by Miele and 
Cantrell [35] . corresponds to: 

(Vfc>l) D k = [-g k | x k - x k - x \. (18) 

When the objective function is quadratic, this algorithm is equivalent to the linear conjugate 
gradient algorithm [T5]. More recently, several other subspace algorithms have been proposed, 
where, at each iteration k S N, D k usually includes a descent direction (e.g. gradient, Newton, 
truncated Newton directions) and a short history of previous directions (see [171 Tab.l] for a 
general review). 

In addition, the subspace scheme (|16j) was shown to outperform standard descent algorithms 
such as nonlinear conjugate gradient over a set of PLS minimization problems in |1T|. 16 lj . The 
convergence of Algorithm (jT6j) however requires the design of a proper strategy to determine 
the stepsizes (u k ) ke fq, which we discuss in the next section. 



3.2 Majorize-Minimize stepsize ° 

At each iteration feel, the minimization of f k g using the Majorization-Minimization (MM) 
principle is approximately performed by successive minimizations of tangent majorant functions 
for f k)S . Let q k : R M x R A/ ->• R and let u' G R A/ . The function q k (.,u') is said to be a tangent 
majorant for f k) g at «' if 

f(V«GR M ) q k (u,u') > f ktS (u) 
\q k {u',u') = f k ,s( u ')- 

From this point forward, we assume that f k g is differentiable. Following [17] . we propose to 
employ a convex quadratic tangent majorant function of the form: 

(Vu G R A/ ) g fc (u,«') = f k , s (u') + V/ fc ,{(u') T (« - «') + |(« - u') T Bk,„'(« - «'). ( 20 ) 

where ^7fk,s( u ') denotes the derivative of 5 at u' , and -Bfc.u' is an x symmetric positive 
semi-definite matrix that ensures the fulfillment of majorization properties (|19p. The initial 
minimization of f k g is replaced by a sequence of easier subproblems, corresponding to the 
following MM update rule: 



r 4 = o, 



Vj G {1, J} 

tt( G Argmin q k (u,u{~ ) 



(21) 



Note that for M = 1, this reduces to the scalar MM line search [36]. 

3.3 Construction of the majorizing approximation 

We now make the following assumption: 

Assumption 3. (i) $ is differentiable with an L-Lipschitzian gradient, i.e. 

(Vz G R Q )(Vz' G R Q ) ||V$(z) - V$(z')|| < L\\z - z'\\. (22) 

(ii) For every s G {1, • • • ,S}, ip St $ is a differentiable function. 

(iii) For every s G {1, • • • , S}, ip s ,s(^) i> s concave on [0, +oo). 

(iv) For every s G {1, • • • ,5}, there exists G [0, +oo) such that (Vi G (0, +oo)) < ip s ,6(t) < 
JJst where ip s s is the derivative of ip s $. In addition, lim^o V's s{t)/t £ 



We emphasize the fact that Assumptions E ^ii) (iv) hold for the £2-^0 penalties in Exam 



ples[ ffil) (v) Morever, Tab. [T] presents several examples of functions fulfilling Assumption [ ffii)j 



By defining 

(VsG{l,...,5})(VtGR) u 8iS (t) = i> s , S (t)/t, (23) 

(the function co s § is extended by continuity at 0), a tangent majorant can be built as described 
below: 



Function name 




Lipschitz 




Z = (Zq)l<q<Q € R Q 


constant L 


Least squares 


\z T kz 

A G M^ x< 5 symmetric positive semi-definite 


l|A|| 


£ 2 - h 




maxi< a < Q (-i=) 


[58] 


Nt G R) 6 n (t) = si On + t 2 Or, > 




Huber 


— 1 Ty v y/ 


2 maxi< <n p a 


m 


, x 1 A?* 2 if 1*1 < v a 

yteR)<t> q (t) = r q - 9 

p Q v a \l \t\ — v a \) it m > v g 

i r y y \ i i yi/ ii y 
V q > 0, p q > 




Cauchy 


/ ^g— l ' y \ y/ 


maxi< r7 <nf — ) 


m 


(Vt G R) <A o (0 = ln(p„ + f 2 ), p„ > 




Squared distance to 


\# B {z) 


1 


a closed convex set B [6] 






Smoothed max [7] 


P ln (E?-i e^ /p )> p> 


Vp 


Inf-convolution 


inf 2l+Z2=z $i(zi) + ^2(22) 


P 


m 


$1 g r (R Q ), $2 g r (R Q ) 

$2 p-Lipschitz differentiable, p > 0, 
such that lim|| z ||_ > . +00 = +°° 





Table 1: Some examples of functions <E> with an L-Lipschitzian gradient. (||A|| denotes the spec- 
tral norm of A and ro(M^) denotes the class of proper lower-semicontinuous convex functions 
from to (—00, +00].) 



Lemma 1. lj For every x G R , let 

A(x) = pH T H + 2V r V + F T Diag {b(x)} V, (24) 
where p G [L, +00) and b(x) = (bi(x)) 1<i<SP G R SP with P = Yl s =i ^ s * s suc ^ that 

(Vs G {1, . . . , S}) (Vp G {1, . . . , P s }) b Pl+ ... +Pa _ 1+p (x) = uj S:5 (\\V s x - c s \\). (25) 
Let v! G R M and k G N. Then, under Assumption^ q k (-,u') with 

B Ku , = DjA{x k + D k u')D k , (26) 
is a convex quadratic tangent majorant of fs t k a * u - 

Hence, according to (|20p and ()2ip . the optimality condition for the choice of the stepsize in 
the MM iteration is given by: 

(Vfc G N)(Vj G {1, . . . , J}) B k ^-i (u{ - uf 1 ) + V/ M (uf x ) = 0. (27) 

This yields the explicit stepsize formula 

< = < 1 " B-^_ a V/^uJ" 1 ), (28) 



jA/xAf 



. One of the main advantages of 



where B 1 , . , is the pseudo-inverse of B, ,-i 

approach is that the computational cost of the required inversion is low, provided that the 
number M of search directions remains small. The resulting MM subspace algorithm reads 



x £ 

\/k g N 



u 



o 



o, 

Vj G {1, 



i 

U 



J} 

Dj A(x k + D k u{- l )D k , 



(29) 



j'-i 



4 Convergence result 

We first provide some preliminary technical lemmas before stating our main convergence result. 
In the following, for every k G N and j € {0, . . . , J}, we define 



x{ = x k + D k u 3 k , 
9i = VF 5 (x{), 



(30) 
(31) 



(thus, x k = x k+ \ and g k = <7fc+i). Moreover, we assume that the set of directions (D k ) k ^ 
fulfills the following condition: 

Assumption 4. For every k €N, the matrix of directions D k is of size N xM with 1 < M < N 
and the first subspace direction d\ is gradient-related i.e., 



gjd\ < -7oll0fcH 2 , 
\\d\W < 7i||fffe||) 



(32) 
(33) 



with 70 > and 71 > 0. 



As emphasized in [HI Sec. 1.2] and [T71 Sec.III-D], conditions ([32]) and ([33]) hold for a large 
family of descent directions, such as the steepest descent direction or the truncated Newton 
direction. 



4.1 Preliminary results 



Lemma 2. Under Assumptions^ and\4\ there exists a constant v > such that, for every 
keNandje{l,..., J}, F 5 (x k ) - F & {x{) > tv^Wgkf . 



Proof. According to Assumption [ ^iv) and Eq. ([23]) . for every s G {1, ••• ,S}, uj St s is upper- 
bounded on (0, +00). Hence, there exists v > such that, for every x G M. N and v E R-^, 
v T A(x)v < v\\v\\ 2 /2. The result then follows from [T71 Theorem 1]. □ 



Lemma 3. Under Assumptions^ and\^ the MM subspace iterates are such that 

1 
2 



( V fc G N)(Vj G {0, . . . , J — 1}) F s (x{) - F s {x{ +1 ) > ?-\\x{ +1 - x{\\ 2 (34) 



where rj > is the smallest eigenvalue of fiH T H + 2Vq Vq. 



Proof. Let k G N and j G {0, . . . , J — 1}. According to (f2"0"j) and the definition of ^ 



fkM) - Qk(u{ + \ui) = V M ui) T (u{ +1 - u{). (35) 



Furthermore, q k (u{ +1 , u j ) > fk,s(u j k +1 ). Thus, 



fkM) ~ /MK +1 ) > -\vfkA^) T « +1 ~ <)■ (36) 



The last inequality also reads 



F s (x{) - F s (x{ +1 ) > -I V / fc , 5 K) T K +1 " <)■ (37) 



2 

So, using (|26|) and (f27l) . 



F 5 K) - F 5 (^ +1 ) > -(D k (u{ +1 - u{)) T A(x{)D k (u{ +l - u{) (38) 
>!l\\D k (u{ +1 -u{)\\ 2 . (39) 

In the latter inequality, we make use of the fact that, since Kerii" n KerVo = {0}, r\ is positive, 
and 

(Vsb G R^)(Vv G R^) ^ T y4(«)^ > 7?|H| 2 . (40) 

□ 

Lemma 4. Under Assumptions [7] and [3], £/ie MM subspace iterates are such that 

(Vfc G N)(Vj G {0, . . . , J — 1}) - xjH < ||9 J fc ||, (41) 

where r/ > is the same constant as in Lemma\^ 

Proof. According to (|27|) . we have, for every k G N and j G {0, . . . , J — 1}, 

Djgi + DjA(xi)D k (u{ +1 - u{) = 0. (42) 

Hence, 

(D k (u{ +1 - u{)) T g{ + (D k (u{ +1 - u{)) T A{x k )D k (u{ +1 - u{) = 0. (43) 
By using flU, (@3]) leads to 

-(D k (u{ +1 - u{)) T gi > V \\D k (ui +1 - u{)f. (44) 
In addition, the Cauchy-Schwarz inequality leads to 

-(D k (u{ +1 - u{)) T gl < \\g{\\\\D k (u{ +1 - u{)\\. (45) 
Thus, the latter two inequalities yield: 

V \\D k (u{ +1 - ui)f < \\gi\\\\D k (u{ +1 -u{)\\. (46) 

Substituting with (|30p . obtaining the desired result is straightforward. □ 



4.2 Convergence theorem 



12 



Based on the two previous lemmas, classical results in the optimization literature [17] may allow 
us to deduce the convergence of the sequence (x k ) k ^ generated by the MM subspace algorithm, 
but these results require restrictive conditions on the critical points of the objective function F$. 
We propose here a more general approach based on recent results in non-convex optimization 
[31 HI [5] . We first recall the following definition from [3D] : 

Definition 1. A differentiable function G: M. N — > R is said to satisfy the Kurdyka-Lojasie- 
wicz inequality if, for every x E R^ and every bounded neighborhood Eofx, there exist three 
constants k > 0, £ > and 9 £ [0, 1) such that 

\\VG(x)\\>K\G(x)-G(x)f, (47) 

for every x £ E such that \G(x) — G{x)\ < 

The interesting point is that this inequality is satisfied for a wide class of functions. In 
particular, it holds for real analytic functions, semi-algebraic functions and many others |11[ 



x 



\12[ [351 00]. Recall that a function G : R — > R is semi- algebraic if its graph {(x,n) £ ' 
rj = G(x)} is a semi-algebraic set, i.e. it can be expressed as a finite union of subsets of R^ x R 
defined by a finite number of polynomial inequalities. The semi-algebraicity property is stable 
under various operations (sum, product, inversion, composition,...). Examples of semi-algebraic 
functions include x i— > \\Hx — y\\ 2 , when the functions (V ; s,5)i<s<5 are given by Example Q^ii)! 
or [ ^v)| the squared distance to a closed convex semi-algebraic set. In turn, examples of real- 
analytic functions include x i— > \\Hx — y\\ 2 and ^ when the functions (ip s ,s)i<s<s are given 
by Examples [^ii)H^iv)| Note that a more general local version of inequality (|47p can also be 



found in the literature [12] . 

Let us now state our main convergence result: 

Theorem 3. Assume that F$ satisfies the Kurdyka-Lojasiewicz inequality. Under Assump- 
tions [H El and the MM subspace algorithm given by ([29]) generates a sequence (a^fceN con- 
verging to a critical point x of F$. Moreover, this sequence has a finite length in the sense that 

+0O 

^2 \\x k+ i - x k \\ < +oo. (48) 

fc=0 

Proof. As (i 7 5(a3fc))fc6N is a decreasing sequence and lev <Fs ^ Xo ^ = {a; € M. N \Fs(x) < Fs(xq)} 
is a bounded set (by virtue of Proposition D[p)j) , the sequence (xk)k&N belongs to a compact 
subset E of R^. Hence, there exists a subsequence (x^)^ of (ajfc)fceN converging to a vector 
x of M. N . Besides, since F$ is a continuous function, (^(a^J)^ converges to F$(x). As 
(Fs(x/ { ))f i .^ is decreasing, and Proposition [|pj| shows that it is bounded below, we deduce that 
(Fs(xk) — Fs(x))keN is a nonnegative sequence converging to 0. 

Now, by invoking Lemma [2] (with j = J), we have that, for every k £ N, 

2 

^-'Hf/fcll 2 < Fs(x k ) - F 5 {x k+1 ) = F 5 (x k ) - F s (x) - {F 5 (x k+1 ) - F s (x)). (49) 

According to the Lojasiewicz property, there exist constants k > 0, £ > and 9 £ [0, 1) such 
that 

\\VFs(x)\\>K\Fs(x)-F s (x)\ e , (50) 



for every x £ E such that \F$(x) — Fg(x)\ < £. Let us now apply to the convex function" 
(p: [0, +00) — > [0, +00) : u 1 — y -u 1 ^ 1 "^, the gradient inequality 

(\f(u,v) £ [0,+oo) 2 ) > ip{u) +<p(u)(v - u) (51) 

which, after a change of variables, can be rewritten as 

(V(u,v) £ [0,+oo) 2 ) u-v< {l-e^u^u 1 - 9 -v l ~ e ). (52) 

Combining the latter inequality with (|49p leads to 

Ff(aj fc ) - - (F 4 (aj fc+1 ) - F s (x)) < (1 - ^(^(a*) - F^'A* (53) 

where 

A fc = (F 5 (x k ) - Fsix)) 1 ' - (F s (x k+1 ) - F 5 {x)) l - d . (54) 

Thus, 

2 

\\9k\\ 2 < -^u(l " e)" 1 ^^) - Fi(i))°A fc . (55) 
7o 

Since {F${xk))k&i converges to Fs(x), there exists k* £ N, such that, for every k > k* , < 
Fs(x k ) — Fs(x) < By applying the Lojasiewicz inequality, 

(VA:>A;*) \\g k \\ 2 < ^vk~\\ - 9)-%^ A k . (56) 

This allows us to deduce that 

+00 2 

]T \\g k \\ < ^VK-^l-ey^Fsix^-Fsix)) 1 - 6 . (57) 
fc=fc* 7 o 

Furthermore, according to (f30|) . 

J-l 

2 ll"~^~!~-L "fCII 2 



J|| a;fe+1 - a;fe || 2 = ^||^K +1 -^) (58) 



i=o 

which, by using Lemma [3] and the convexity of the squared norm, yields for every k £ N, 

7 J " 1 

"ll II 2 ^ ^ J II j+1 i||2 

2 ll^fc+i ~ ^fcll S— 2^11*^ _a; fcll 
J-i 

< - F s« +1 ) = J{Fs(xk) ~ F 5 (x k+1 )). (59) 

i=o 

By proceeding similarly to the derivation of (|56|) . we obtain: for every k > k* , 

%\\x k+1 - x k \\ 2 < J(l - ey l (F s {x k ) - F 5 {x)) 9 A k < Jk~\1 - ^-^I^IIAfc. (60) 

By using the fact that, for every (u, v) £ [0, +00) 2 , (to) 1 / 2 < u + j, and taking u = Jr/ _1 K _1 (1 — 
0) -1 A fe and v = 2\\g k \\, ((60j) leads to 

- ajfcll < Jif 1 ^ 1 ^ - ey l /\ k + (61) 



By summing now over k and using (|54|) and (|57p . we finally obtain 
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+00 2 

£ ||aj fc+1 - x fc || < k-^I - OTHJrr 1 + ^oH^Osfc*) - F 4 (x)) 1_fl . (62) 
fc=fc* 7 o 2 

This gives us the desired finite length property. In addition, since this condition implies that 
( x k)keN is a Cauchy sequence, it converges towards a single point, which is necessarily x. 
Finally, due to the continuity of Fg and Lemma[51 (gk)keN converges to zero. As (x^, F$(x k )) — >• 
(x,F§(x)), the closedness property of the gradient implies that VFj(i) = 0, i.e. x must be a 
critical point of Fg. 

□ 

Note that the inexact gradient methods that are studied in [5] are distinct to the subspace 
algorithms we consider. 



5 Simulation results 

The aim of this section is to illustrate and analyze the performance of the proposed algorithm in 
the context of Problem ([I]). We also show the non-convex penalization functions in Example[2]to 
be appropriate for image processing applications. To this end, four image processing problems 
are considered, namely denoising, segmentation, deblurring and tomographic reconstruction. 
For each of them, the produced image x G ~R N is defined as a minimizer of the function F$, 
where 3>, H, y and V depend on the considered application. For the elastic net regularization 
term, we choose Vo = rJ, r > 0. For deblurring and tomographic applications, the linear 
operator H is not necessarily injective. Thus, we set r equal to a small positive value in order 
to fulfill Assumption D jpn") In the two other cases, r is set to zero. 



For every s G {1, . . . , S}, we have set c s = 0. For the potential function i/j s j, we have 
tested the smooth convex £2 — h function %j)s,S'- t ^ A( v / 1 + 1 2 /5 2 — 1) with A > (SC) and 
the smooth non-convex functions in Example [Jn) (SNC(ii)), Example [Wm)] (SNC(iii)), Ex- 



ample BKiv)| (SNq(iv)D and Example $y)\ (SNq(v)j). Moreover, in the case of denoising and 



segmentation examples, we provide optimization results for four state-of-the-art combinatorial 
optimization algorithms, namely the a-expansion |13j (a-EXP), Quantized-Convex Move Split- 
ting [33] (QCSM), Tree-Reweighted (TRW) [S] and Belief Propagation (BP) [26] algorithms, 
for which the nonsmooth non-convex truncated quadratic function in Example [ ^i)| (NSNC) is 
considered. When the linear degradation operator is not the identity matrix, we do not provide 
any comparison with the combinatorial algorithms. Indeed, although a few algorithms 1413] 
are applicable to inverse problems involving a linear degradation operator, these methods are 
well-founded only for a sparse convolution operator H. Moreover, they rely on an adaptation 
of the graph cut a-expansion algorithm, which is shown in our segmentation and denoising 
examples to be outperformed by our proposed approach. 

The computation of the proposed MM subspace algorithm requires specifying the direction 
set -Dfc, for every k € N, and the number of MM sub-iterations J. First, the memory-gradient 
direction matrices, 

(VA; > m) D k = [-g k \ x k - x h - X | • • • | x k _ m+1 - x k _ m ] G M Wx ( m+1 ', (63) 

with memory parameter m > 0, is considered. Moreover, in all our experiments, we set J = 1. 
This choice was observed to yield the best results in terms of convergence profile in the context of 
MM-based stepsize computation [17} 136]. In the following, we compare our proposed subspace 
algorithm, denoted hereafter by MM-MG-m (for Majorize- Minimize Memory Gradient) with 



three other iterative first order descent methods. The methods we compare against are namely 
the nonlinear conjugate gradient (NLCG) algorithm [29], the L-BFGS algorithm |39j with the 
memory parameter set to 3, and the fast version of half quadratic (HQ) algorithm PQ. For each 
descent algorithm, the MM scalar line search with J = 1 is employed for the computation of the 
stepsize. In the case of HQ, the inner optimization problems are solved partially with conjugate 
gradient iterations. Note that this algorithm has been previously studied in the context of 
non-convex regularization functions in [2Q|, [5T] . In order to limit the influence of possible local 
minima in the non-convex case, the result of 10 iterations of convex minimization using an £2 — £\ 
penalty is employed as an initialization. In the convex case, minimization is started with the 
constant null image. The computational complexity is evaluated in terms of iteration number 
and computational time necessary to achieve the global stopping rule 1 1 flTfe 1 1 / x/^V < 10~ 4 . C++ 
codes were compiled with the Intel C++ compiler icpc (version 12.1.0) and were run on an 
Intel(R) Xeon(R) CPU X5570 at 2.93GHz, in a single thread. 



5.1 Image denoising 

The first problem considered in this section corresponds to the recovery of an image x from 
noisy observations u = x + w where w is a realization of a zero-mean white Gaussian noise. 
The vector x here corresponds to Word image of size N = 128 x 128 pixels. The variance of 
the noise was adjusted to correspond to a signal-to-noise ratio (SNR) of 15 dB (Fig. [2]). The 
recovery of the original image is performed by solving ([1]) where Q = 2N, 



H 



I ' 




u 


I 


y = 






(64) 



and 



(\Jz = (z q ) 



q)l<q<2Nj 



<&(*) 




z 2 a +P 



2N 

£ 

q=N+l 



(65) 



where ds denotes the distance to the closed convex interval B = [0, 255] and (3 > is a weighting 
factor. Then, is Lipschitz differentiable with Lipschitz constant L = max(l,/3). In the sequel, 
we choose /3 = 1 so that we have L = 1. Moreover, the penalization term ([3]) is used, with r = 
and an anisotropic penalization on neighboring pixels i.e., S = 27V, and for every s6{l,..., N} 
(resp. s G {iV + 1, . . . , 2N}), P s = 1 and V s corresponds to a horizontal (resp. vertical) gradient 
operator. This anisotropic term is chosen so as to compare more fairly our approach with the 
combinatorial methods. 

Parameters A and 5 were assessed to maximize the SNR between the original image and its 
reconstructed version. In Fig. the reconstructed images are displayed and the corresponding 
SNR and MSSIM |59j values are provided. Morever, the absolute values of the reconstruction 
errors x — x are illustrated. It should be noticed that the non-convex regularization strategy 



with penalty function SNC (ii) leads to the best results in terms of reconstruction quality. 



5.1.1 Influence of memory size 

We first analyze the effect of the memory size m on the performance of our algorithm. We 
recall that the detailed performance analysis of MM-MG algorithm with respect to the size 
of the memory was provided in [T7], but it was restricted to the convex case. The results in 
Tab. [2] illustrate that the choice where memory equals one, which corresponds to a subspace 
with size 2, leads to the best results in terms of computational time. Hence, our experiments 
confirm the conclusions drawn in [T7] for the convex case. Consequently, the setting m = 1, 
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Figure 2: Original image with 128 x 128 pixels (a) and noisy image with SNR= 15 dB, MSSIM 
= 0.66, noise standard deviation equal to 10 (6). 
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Figure 3: Denoising results and absolute reconstruction error with SC penalty using MM-MG, 
A = 0.3, 5 = 0.07, SNR = 20.41 dB, MSSIM = 0.89 (a), with NSNC penalty using TRW, 



A = 350, 5 = 3.5, SNR = 22.8 dB, MSSIM = 0.93 (b) and with SNCp)] penalty using MM-MG, 
A = 280, 5 = 7.25, SNR = 22.74 dB, MSSIM = 0.92 (c). 



i.e. D k = [— g k | x k — Xfc-i] for all k > 1 has been retained for the remaining experiments' 
presented in the paper, and the shorter notation MM-MG is employed for denoting the MM- 
MG-1 algorithm. 
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Table 2: Denoising problem with word image. Influence of memory parameter m in MM-MG 
algorithm. 

5.1.2 Comparison with NLCG algorithm 

The NLCG algorithm is based on the following iterations: 

(Vfc>l) x k+1 = x k + a k (-g k + (3 k (x k - £c fe _i)), (66) 

where a k > is the stepsize and G R is the conjugacy parameter. Tab. [3] summarizes the 
performances of NLCG for five different conjugacy strategies described in |29j . Contrary to 
the convex case, in the non-convex case the conjugacy formula has a major influence on the 
convergence speed (see Tab. [3] results related to NLCG in rows 1-6 and 7-30). In particular the 
conjugacy strategies FR and DY do not appear well-adapted to the non-convex problems. On the 



other hand, the HS, LS and PRP+ conjugacy parameters yield a good numerical performance 
Thus, they have been selected for the numerical experiments in the following. For comparison, 
we include in Tab. 0the results of MM-MG for m = 1. Although the superiority of MM-MG 
versus NLCG is not established theoretically, these experimental results are very promising. 
They show that MM-MG algorithm is faster than the considered non- linear conjugate gradient 
algorithms. 
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Table 3: Denoising problem with word image. Influence of conjugacy parameter /3& in NLCG 
algorithm. 



5.1.3 Summary 
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We summarize the results by comparing the performance of continuous and discrete algorithms 
with SC, SNC and NSNC potential functions (see Tab.0]). One can observe that the considered 
discrete optimization algorithms lead to a SNR which is very similar to that obtained with 
smooth non-convex regularization. However, they are more demanding in terms of computa- 
tional time than MM-MG. Thus, we can conclude that the MM-MG algorithm behaves well in 
comparison with the considered continuous and discrete algorithms. 



5.2 Image segmentation 

In the second experiment, we consider the segmentation of Rice image of size N = 256 x 256 
(see Fig. We define the segmented image as a minimizer of F$, where H = I, y identifies 
with the original image and (Vz G R ) &(z) = ^||z|| 2 . The anisotropic penalization term is 
again used with r = for the same reason as earlier. Figs. and [7] illustrate the resulting 



images and their gradient for SC, NSNC and SNC (iii) penalty functions, when regularization 
parameters (A, 5) are tuned in order to obtain the best visual results in terms of segmentation. 
The gradients of the resulting images are evaluated by displaying, for every n G { 1 , . . . , N} , 
G n = \\A n x\\ with A n = [A n A n } T G R 2xN where A n G R N and A^ G R N represent the 
first-order difference operators in the horizontal and vertical directions. Finally, the intensity 
values along the (arbitrarily chosen) 50th line of each image are plotted in Fig. [6] to better 
illustrate the behaviors of the different approaches. 

According to Tab. [51 the best performance in terms of computational time is obtained by 
the MM-MG algorithm with the SC penalty. However, the convex penalization strategy leads 
to poor segmentation results. Indeed, the boundaries of the reconstructed image are smooth 
and the background suffers from staircasing effect. In contrast, the non-convex penalties give 
rise to truly piecewise constant images. The considered algorithms for the truncated quadratic 
penalty lead to segmented images very similar to the one obtained with SNC regularization. 
However, Tab. [5] shows that they are more demanding in terms of computational time than 
MM-MG. 



5.3 Image deblurring 

Our third experiment corresponds to the problem of restoring the montage image x, with size 
256 x 256, from blurred and noisy observations u = Rx + w where it) is a realization of a zero- 
mean white Gaussian noise and R models a linear uniform blur with size 3x3. The recovery 
of the original image is performed by solving ([I]) where Q = 2N, 



H = 



R ' 




u 


I 


y = 






and 

/ TV 2N 

(v* = (z q )x< q < 2N ) Hz) = - E 4 + P E 

\q=l q=N+l 

where <ig denotes the distance to the closed convex interval B = [0,255] and /3 = 0.01. Fur- 
thermore, function ^ is given by ([3j) with r = 10 _ 10 and S = 2N. We consider, for ev- 
ery s G {1, . . . ,N}, an isotropic regularization between neighbooring pixels, i.e., P s = 2 and 
V s = [Aj Al] T where G M. N (resp. AJ G M. N ) corresponds to a horizontal (resp. ver- 
tical) gradient operator, and, for every s G {N + 1, . . . ,2iV}, the Hessian-based penalization 
from [38] i.e., P s = 3 and V s = [Af y/2Af A V S V ] T where Af G R N , A^ v G 1^ and 
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18 


5.33 
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Table 4: Results for the denoising problem. 
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Figure 4: Initial gray level image with 256 x 256 pixels. 



^ "j® 



1*. 








(a) (6) (c) 

Figure 5: Segmented images and their gradient for SC penalty using MM-MG, A = 2, 6 = 0.2 
(a), for NSNC penalty using TRW, 
MM-MG, A = 1500, 6 = 8 (c). 



A = 1550, S = 3.5 (6) and for SNC (hi) penalty using 




Figure 6: Comparison of 50th line of segmented images using SC (thin line), NSNC (crosses) 
and SNC (hi) (thick line) potential functions. The 50th line of the original image is indicated in 
dotted plot. 
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Figure 7: Detail of segmented images and their gradient for SC penalty using MM-MG, A = 2, 
5 = 0.2 (a), NSNC penalty using A = 1550, 5 = 3.5 (b) and for SNCpi)] penalty using MM-MG, 
A = 1500, 5 = 8 (c). 



Penalty functionf A, 5) 


Algorithm 


Iteration 


Time 


Fs 


SC! (9 9) 


J.VX1VX 1VX V_J 


1 39 


9Q 


6 69 • 1 6 

U.Ut? XVJ 




NLCG-HS 


144 


1.49 


6.69 • 10 6 




NLCG-PRP+ 


143 


1.47 


6.69 • 10 6 




NLCG-LS 


148 


1.54 


6.69 • 10 6 




L-BFGS 


215 


3.44 


6.69 • 10 6 




HQ 


898 


18.19 


6.69 • 10 6 


&NC (m) nsno 

kJl > V, I 111 J ( X tJUL/^ Ul 


MM-MG 


491 


3 43 

. Tit J 


1 59-1 7 




NLCG-HS 


1578 


14.93 


1.59 • 10 7 




NLCG-PRP+ 


463 


4.25 


1.59 • 10 7 




NLCG-LS 


598 


5.64 


1.59 • 10 7 




L-BFGS 


632 


9.57 


1.59 • 10 7 




HQ 


3553 


65.2 


1.59 • 10 7 


NSNC (1550,3.5) 


a-EXP 


9 


57.97 


5.58 • 10 6 




QCSM 


1 


7.05 


5.52 • 10 6 




TRW 


5 


6.71 


5.52 • 10 6 




BP 


50 


61.83 


5.52 • 10 6 



23 



Table 5: Results for the segmentation problem. 



AJ V £ W N model the second-order finite difference operators between neighbooring pixels, 
as described in [381 Sec.III-A]. For s G {N + 1,...,2A} we consider the £2 — £\ function 
ips,6 : t ^ p(y/l + 1 2 / {05) 2 — 1), where p and 9 take positive values. Tab. [6] presents the re- 
sults for SC and SNC(ii) regularization of the image gradient (i.e. ijj s> s for s E {1, ...,JV} ). 
Parameters (p, 6, A, 5) are tuned to maximize the SNR of the restored image. In both cases, 
the MM-MG algorithm outperforms the three considered descent algorithms in terms of time 
efficiency. Additionally, the non-convex strategy leads to better results in terms of SNR (see 
Fig. [5]) . One can also observe that in this case the staircasing effect is reduced (see some image 
details in Fig. [8j). 



5.4 Image reconstruction 

In our last experiment, we consider the problem of reconstructing an image x S W N from noisy 
tomographic acquisitions, modeled as 

u = Rx + w, (67) 

where R is the Radon projection matrix whose (r, n) element (l<r<i?, l<n< N) 
models the contribution of the nth pixel to the rth datapoint, and w represents an additive 
noise component. In this example, we consider one slice of the standard Zubal phantom [63] 
with dimensions N = 128 x 128, and R = 46336 measurements from 181 projection lines and 
256 angles. This image is corrupted with a zero-mean independent and identically distributed 
Laplacian noise (SNR = 23.5 dB). Fig. [TT1 shows the original image and its noisy sinogram. 
The reconstruction is performed by minimizing Fs with Q = R + N , 



R ' 




u 


I 


y = 







Figure 9: Deblurring results with SC penalty using MM-MG, p = 0.56, = 0.18, A = 0.042, 
5 = 4.19, SNR = 26.90 dB, MSSIM = 0.94 (a) and with SNCp)] penalty using MM-MG, 
p = 41.55, 6 = 0.86, A = 3.68, 5 = 18.65, SNR = 27.69 dB, MSSIM = 0.94 (6). 



Penalty function^, 6, A, 5) 


Algorithm 


Iteration 


Time 


F s 


SNR 


SC! (0 56 D 1 8 D 049 4 1 9 s ! 


MM-MC4 


1 91 

J. Li ± 


8 36 


8 99 • ID 6 


96 QD 
Li\j . ctyj 




NLCG-HS 


121 


8.92 


8.22 • 10 6 


26.90 




NLCG-PRP+ 


129 


9.32 


8.22 • 10 6 


26.90 




NLCG-LS 


131 


9.51 


8.22 • 10 6 


26.90 




L-BFGS 


162 


12.42 


8.22 • 10 6 


26.90 




HQ 


418 


94.3 


8.22 • 10 6 


26.90 


SNCfii") Ml 55 D 86 3 68 18 65"l 


MM-MG 


1 96 


1 1 58 


7 99 • 1 n 6 


97 69 

Li 1 . \JtJ 




NLCG-HS 


243 


15.93 


7.92 • 10 6 


27.69 




NLCG-PRP+ 


221 


14.41 


7.92 • 10 6 


27.69 




NLCG-LS 


246 


15.62 


7.92 • 10 6 


27.69 




L-BFGS 


216 


14.78 


7.92 • 10 6 


27.69 




HQ 


616 


104.9 


7.92 • 10 6 


27.69 



25 



Table 6: Results for the deblurring problem. 



and 

' R 



(Vz = (^)!< 3 < Q ) Hz) = \ E^/1 + (VP) 2 + ^ E d2 B^) ( 69 ) 

\g=l q=R+l J 

with B = [0,255]. Thus, $ has a Lipschitz gradient with constant L = max(-^ , (3) . In the 
sequel, we take f3 = 10~ 2 . Furthermore, the regularization function 0, with t = 10 -10 and 
an isotropic edge-preserving penalty is considered i.e., S = N and, for every s £ {1, . . . , N}, 
P s = 2 and V s = [A* A V S ] T where A^ £ 1^ (resp. A v s £ R N ) corresponds to a horizontal 
(resp. vertical) gradient operator. 



Fig. UU shows the results obtained for penalization strategies SC and SNC (ii) with (A, 5, p) 
tuned to maximize the SNR of the restored image. We emphasize that the smooth non-convex 
penalty leads to better results in terms of reconstruction quality. In particular, it appears to 
be well-suited to the reconstruction of the boundaries of the image, as demonstrated in Fig. [T2l 
Tab. [7] illustrates the performance of the MM-MG algorithm, in comparison with the three 



tested descent algorithms, when either the SC or the SNC (ii) penalty function is used. In this 
example, the proposed algorithm outperforms the others, in terms of both iteration number and 
computational time. In the non-convex case, because of the presence of local minimizers, the 
four algorithms do not lead to the same final SNR value. It can be noticed that the smallest 
final criterion value is obtained with the MM-MG algorithm. 



6 Conclusion 

In this work, we have considered a class of smooth non-convex regularization functions and we 
have proposed an efficient minimization strategy for solving the associated variational problems 
in imaging applications. Connections with £q penalized problems have been shown asymptoti- 
cally. In addition, a novel convergence proof of the proposed subspace MM algorithm relying on 
the Kurdyka-Lojasiewicz inequality has been given. Numerical experiments have been carried 
out to compare the proposed approach with other state-of-the art continuous optimization meth- 
ods (both for non-convex and convex penalizations) and with discrete optimization approaches 
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(a) 



(b) 



Figure 11: Reconstructed image using SC penalty function with MM-MG, A = 0.06, 5 = 2.9, 
p = 1.6, SNR = 18.05 dB, MSSIM = 0.81 (a) or using SNCp)] penalty function with MM-MG, 
A = 1.2, <5 = 11.1, p = 2.2, SNR = 21.13 dB, MSSIM = 0.92 (6). 






(a) 



(6) 



(c) 



Figure 12: Detail of the original image (a) and corresponding reconstructions with convex 
penalty function (b) and non-convex penalty function (c). 
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Table 7: Results for the tomography problem. 



dealing with a truncated quadratic penalization. In the four presented image processing exam- 
ples, we argue that the proposed approach constitutes an appealing alternative to the existing 
methods in terms of recovered image quality and computational time. 
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